PERF: vectorize _range_from_fields and _assemble_from_unit_mappings #65195
Open
jbrockmendel wants to merge 2 commits into pandas-dev:main from
Add period_ordinals_from_fields Cython function that converts arrays of year/month/day/hour/minute/second fields to period ordinals in a single C-level loop, replacing the Python-space list-append loop in _range_from_fields.

Reuse the same function in to_datetime's _assemble_from_unit_mappings with freq=FR_US to construct datetime64[us] directly from field arrays, avoiding the object-dtype round-trip through ensure_object + array_strptime with format="%Y%m%d".

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
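For readers unfamiliar with the idea, here is a minimal NumPy sketch of going from integer field arrays straight to ordinals and to datetime64[us] without an object-dtype or string round-trip. It is not the Cython code from this PR: the function names `month_ordinals` and `datetime64_us_from_fields` are made up for illustration, the sketch assumes monthly ordinals count months since 1970-01, and it performs no date validation.

```python
import numpy as np


def month_ordinals(years, months):
    """Months elapsed since 1970-01 for each (year, month) pair (illustrative helper)."""
    years = np.asarray(years, dtype=np.int64)
    months = np.asarray(months, dtype=np.int64)
    return (years - 1970) * 12 + (months - 1)


def datetime64_us_from_fields(years, months, days, hours=0, minutes=0, seconds=0):
    """Build datetime64[us] values from integer fields; inputs are NOT validated."""
    # First day of each month, expressed at day resolution.
    month_start = (
        month_ordinals(years, months).astype("datetime64[M]").astype("datetime64[D]")
    )
    # Day-of-month is 1-based, so day 1 adds an offset of zero days.
    day_offset = (np.asarray(days, dtype=np.int64) - 1).astype("timedelta64[D]")
    # Time of day collapsed to whole seconds.
    secs = (
        np.asarray(hours, dtype=np.int64) * 3600
        + np.asarray(minutes, dtype=np.int64) * 60
        + np.asarray(seconds, dtype=np.int64)
    )
    dates = (month_start + day_offset).astype("datetime64[us]")
    return dates + secs.astype("timedelta64[s]")


stamps = datetime64_us_from_fields([2024, 1999], [2, 12], [29, 31], [13, 0], [5, 0], [7, 0])
print(stamps)
```

Because nothing is validated here, an out-of-range day such as February 30 silently rolls over into the next month, which is why a real implementation needs a separate validity check.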
b9f6593 to 3dba4f6
Replace nonlocal-based fractional tracking with a tuple return; fold the six per-field conversion calls into a single loop over field_spec that also drives the NaN-mask default-fill step.

Replace the "rerun through %Y%m%d strptime just to borrow its error message" fallback with a direct ValueError naming the offending column.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
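The shape of that refactor can be sketched roughly as follows. This is an illustration of the pattern only, not the code in the commit: `FIELD_SPEC`, `coerce_fields`, the fill values, and the error wording are all assumptions made for the example.

```python
import numpy as np

# Hypothetical (column name, fill value) pairs. The fill value replaces NaN so
# every slot holds a valid integer before any ordinal conversion runs.
FIELD_SPEC = (
    ("year", 1970),
    ("month", 1),
    ("day", 1),
    ("hour", 0),
    ("minute", 0),
    ("second", 0),
)


def coerce_fields(columns):
    """Return (int64 field arrays, NaN mask, whether any value was fractional)."""
    n = len(next(iter(columns.values())))
    fields = {}
    na_mask = np.zeros(n, dtype=bool)
    has_fraction = False
    for name, fill in FIELD_SPEC:
        if name not in columns:
            # Absent columns take the default for every row.
            fields[name] = np.full(n, fill, dtype=np.int64)
            continue
        try:
            values = np.asarray(columns[name], dtype=np.float64)
        except (TypeError, ValueError) as err:
            # Name the offending column directly in the error.
            raise ValueError(f"cannot convert column {name!r} to numeric: {err}") from err
        missing = np.isnan(values)
        na_mask |= missing
        values = np.where(missing, fill, values)
        # Track fractional inputs in a local and return it, rather than
        # mutating an enclosing variable through nonlocal.
        has_fraction = has_fraction or bool((values != np.floor(values)).any())
        fields[name] = values.astype(np.int64)
    return fields, na_mask, has_fraction


fields, na_mask, has_fraction = coerce_fields(
    {"year": [2020, float("nan")], "month": [1, 2], "day": [3, 4.5]}
)
print(fields["year"], na_mask, has_fraction)
```

Returning the mask and the fractional flag as part of a tuple keeps the per-field loop free of shared mutable state, and a caller can apply the mask afterwards to turn the filled rows back into missing values.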
Summary
- Add `period_ordinals_from_fields` Cython function that converts arrays of date/time fields to period ordinals in a single C-level loop, with optional date validation
- Update `_range_from_fields` to call the new Cython function instead of looping in Python-space and appending to a list; vectorize the quarter-to-calendar-month conversion with numpy ops
- Reuse the same function in `_assemble_from_unit_mappings` with `freq=FR_US` to construct `datetime64[us]` directly from field arrays, avoiding the object-dtype round-trip through `ensure_object` + `array_strptime` with `format="%Y%m%d"`

Benchmarked cases:
- `PeriodIndex.from_fields` (2k monthly)
- `PeriodIndex.from_fields` (100k monthly)
- `to_datetime(DataFrame)` 100k unique dates
- `to_datetime(DataFrame)` 100k repeated dates

The old `to_datetime(DataFrame)` path relied on `_maybe_cache` for repeated values but degraded to ~15ms with unique dates due to per-element `str()` + strptime. The new path is uniformly fast.

Test plan
- `pandas/tests/indexes/period/test_constructors.py` (108 + 3 new tests pass)
- `pandas/tests/tools/test_to_datetime.py` (939 + 9 new tests pass)
- `pandas/tests/indexes/period/` (466 tests pass)
- `pandas/tests/arrays/period/` (40 tests pass)

New tests cover: non-DEC quarter fiscal year, all-6-field hourly periods, empty arrays, leap-year Feb 29 validation, invalid day-of-month (raise + coerce), fractional float coerce, empty DataFrame, UTC with time fields.
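Two of the behaviours exercised above, the non-DEC fiscal-quarter mapping and day-of-month validation, can be illustrated with plain NumPy. This is a sketch under stated assumptions rather than the PR's implementation: `quarter_start`, `valid_day_mask`, and the `fy_end_month` parameter are hypothetical names, and the sketch assumes a fiscal year is labelled by the calendar year in which it ends.

```python
import numpy as np


def quarter_start(years, quarters, fy_end_month=12):
    """Map (fiscal year, quarter) to the calendar (year, month) the quarter starts in."""
    years = np.asarray(years, dtype=np.int64)
    quarters = np.asarray(quarters, dtype=np.int64)
    if ((quarters < 1) | (quarters > 4)).any():
        raise ValueError("Quarter must be 1 <= q <= 4")
    months = (fy_end_month + (quarters - 1) * 3) % 12 + 1
    # Quarters starting after the fiscal year-end month fall in the
    # previous calendar year.
    cal_years = years - (months > fy_end_month).astype(np.int64)
    return cal_years, months


def valid_day_mask(years, months, days):
    """True where (year, month, day) is a real calendar date."""
    years = np.asarray(years, dtype=np.int64)
    months = np.asarray(months, dtype=np.int64)
    days = np.asarray(days, dtype=np.int64)
    first = ((years - 1970) * 12 + (months - 1)).astype("datetime64[M]")
    next_first = first + np.timedelta64(1, "M")
    # Month length = distance in days between consecutive month starts.
    days_in_month = (
        next_first.astype("datetime64[D]") - first.astype("datetime64[D]")
    ).astype(np.int64)
    return (months >= 1) & (months <= 12) & (days >= 1) & (days <= days_in_month)


# Fiscal year ending in March: Q1 of fiscal 2024 starts in April 2023.
print(quarter_start([2024] * 4, [1, 2, 3, 4], fy_end_month=3))
print(valid_day_mask([2024, 2023], [2, 2], [29, 29]))
```

A mask like this supports both error modes mentioned in the test plan: raise when any entry is invalid, or coerce by replacing the invalid entries with a missing value.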
🤖 Generated with Claude Code